A Parallel Corpus of Translationese
Abstract
We describe a set of bilingual English–French and English–German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks and political commentary. They will be instrumental for research on translationese and its applications to (human and machine) translation; specifically, they can be used for the task of translationese identification, a research direction that has attracted growing interest in recent years. To validate the quality and reliability of the corpora, we replicated previous results of supervised and unsupervised identification of translationese, and further extended the experiments to additional datasets and languages.
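As a rough illustration of what supervised translationese identification involves, the sketch below trains a nearest-centroid classifier on function-word frequencies. It is not the authors' method or feature set: the word list, the labels `O` (original) and `T` (translated), and all function names are assumptions made for this example, and real experiments would use much larger chunks of text and a far longer function-word list.

```python
import math
from collections import Counter

# Hypothetical, tiny function-word list chosen for illustration only.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that",
                  "which", "however", "therefore", "thus"]


def features(text):
    """Return the relative frequency of each function word in a text chunk."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def centroid(vectors):
    """Average a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labelled_chunks):
    """Build one centroid per label from (text, label) pairs."""
    by_label = {}
    for text, label in labelled_chunks:
        by_label.setdefault(label, []).append(features(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, text):
    """Assign the label whose centroid is closest in Euclidean distance."""
    vec = features(text)
    return min(model, key=lambda label: math.dist(vec, model[label]))


if __name__ == "__main__":
    # Toy training data; the labels are assumed to come from the
    # direction-of-translation annotation of a corpus.
    model = train([
        ("the cat sat in the house", "O"),
        ("however the cat therefore sat thus", "T"),
    ])
    print(predict(model, "however the dog thus ran"))
```

The classifier is deliberately simple so that the role of the annotation is visible: the direction-of-translation labels supply the `O`/`T` targets, and any stronger learner could replace `predict` without changing the data flow.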
Similar resources
Adapting Translation Models to Translationese Improves SMT
Translation models used for statistical machine translation are compiled from parallel corpora; such corpora are manually translated, but the direction of translation is usually unknown, and is consequently ignored. However, much research in Translation Studies indicates that the direction of translation matters, as translated language (translationese) has many unique properties. Specifically, ...
Quantitative Analysis of Translation Revision: Contrastive Corpus Research on Native English and Chinese Translationese
Demand for Chinese-to-English translation has increased over recent years. In contrast, resources for training Chinese-to-English translators remain few, although they are now increasing, relative to those for English-to-Chinese, for example. Corpus-based techniques are now more widely acknowledged as being appropriate for the study of translation. A number of Chinese/English parallel translation corpora have been...
Improving Statistical Machine Translation by Adapting Translation Models to Translationese
Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translat...
On grammatical translationese
In most published work on bilingual data from aligned corpora (variously called parallel corpora, translated corpora, bi-texts or corpora of parallel texts), it is the lexicon that has been studied (cf. Klavans and Tzoukermann (1990), Church and Gale (1991), Marinai et al. (1991), Isabelle et al. (1993)). In this paper, I am concerned with corpus-based contrastive studies of grammatical feature...
Statistical Machine Translation with Automatic Identification of Translationese
Translated texts (in any language) are so markedly different from original ones that text classification techniques can be used to tease them apart. Previous work has shown that awareness of these differences can significantly improve statistical machine translation. These results, however, required meta-information on the ontological status of texts (original or translated) which is typically ...